AITopics | mental health dataset

Collaborating Authors

mental health dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Comprehensive Review of Datasets for Clinical Mental Health AI Systems

Mandal, Aishik, Adhikary, Prottay Kumar, Arnaout, Hiba, Gurevych, Iryna, Chakraborty, Tanmoy

arXiv.org Artificial IntelligenceAug-19-2025

Mental health disorders are rising worldwide. However, the availability of trained clinicians has not scaled proportionally, leaving many people without adequate or timely support. To bridge this gap, recent studies have shown the promise of Artificial Intelligence (AI) to assist mental health diagnosis, monitoring, and intervention. However, the development of efficient, reliable, and ethical AI to assist clinicians is heavily dependent on high-quality clinical training datasets. Despite growing interest in data curation for training clinical AI assistants, existing datasets largely remain scattered, under-documented, and often inaccessible, hindering the reproducibility, comparability, and generalizability of AI models developed for clinical mental health care. In this paper, we present the first comprehensive survey of clinical mental health datasets relevant to the training and development of AI-powered clinical assistants. We categorize these datasets by mental disorders (e.g., depression, schizophrenia), data modalities (e.g., text, speech, physiological signals), task types (e.g., diagnosis prediction, symptom severity estimation, intervention generation), accessibility (public, restricted or private), and sociocultural context (e.g., language and cultural background). Along with these, we also investigate synthetic clinical mental health datasets. Our survey identifies critical gaps such as a lack of longitudinal data, limited cultural and linguistic representation, inconsistent collection and annotation standards, and a lack of modalities in synthetic data. We conclude by outlining key challenges in curating and standardizing future datasets and provide actionable recommendations to facilitate the development of more robust, generalizable, and equitable mental health AI systems.

data mining, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.09809

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Communications > Social Media (0.93)
(4 more...)

Add feedback

MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models

Racha, Suraj, Joshi, Prashant, Raman, Anshika, Jangid, Nikita, Sharma, Mridul, Ramakrishnan, Ganesh, Punjabi, Nirmal

arXiv.org Artificial IntelligenceFeb-21-2025

Mental health remains a challenging problem all over the world, with issues like depression, anxiety becoming increasingly common. Large Language Models (LLMs) have seen a vast application in healthcare, specifically in answering medical questions. However, there is a lack of standard benchmarking datasets for question answering (QA) in mental health. Our work presents a novel multiple choice dataset, MHQA (Mental Health Question Answering), for benchmarking Language models (LMs). Previous mental health datasets have focused primarily on text classification into specific labels or disorders. MHQA, on the other hand, presents question-answering for mental health focused on four key domains: anxiety, depression, trauma, and obsessive/compulsive issues, with diverse question types, namely, factoid, diagnostic, prognostic, and preventive. We use PubMed abstracts as the primary source for QA. We develop a rigorous pipeline for LLM-based identification of information from abstracts based on various selection criteria and converting it into QA pairs. Further, valid QA pairs are extracted based on post-hoc validation criteria. Overall, our MHQA dataset consists of 2,475 expert-verified gold standard instances called MHQA-gold and ~56.1k pairs pseudo labeled using external medical references. We report F1 scores on different LLMs along with few-shot and supervised fine-tuning experiments, further discussing the insights for the scores.

dataset, language model, mental health, (13 more...)

arXiv.org Artificial Intelligence

2502.15418

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities

Mandal, Aishik, Chakraborty, Tanmoy, Gurevych, Iryna

arXiv.org Artificial IntelligenceFeb-1-2025

Mental illness is a widespread and debilitating condition with substantial societal and personal costs. Traditional diagnostic and treatment approaches, such as self-reported questionnaires and psychotherapy sessions, often impose significant burdens on both patients and clinicians, limiting accessibility and efficiency. Recent advances in Artificial Intelligence (AI), particularly in Natural Language Processing and multimodal techniques, hold great potential for recognizing and addressing conditions such as depression, anxiety, bipolar disorder, schizophrenia, and post-traumatic stress disorder. However, privacy concerns, including the risk of sensitive data leakage from datasets and trained models, remain a critical barrier to deploying these AI systems in real-world clinical settings. These challenges are amplified in multimodal methods, where personal identifiers such as voice and facial data can be misused. This paper presents a critical and comprehensive study of the privacy challenges associated with developing and deploying AI models for mental health. We further prescribe potential solutions, including data anonymization, synthetic data generation, and privacy-preserving model training, to strengthen privacy safeguards in practical applications. Additionally, we discuss evaluation frameworks to assess the privacy-utility trade-offs in these approaches. By addressing these challenges, our work aims to advance the development of reliable, privacy-aware AI tools to support clinical decision-making and improve mental health outcomes.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.00451

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
(43 more...)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)

Add feedback

Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media

Qi, Hongzhi, Zhao, Qing, Song, Changwei, Zhai, Wei, Luo, Dan, Liu, Shuo, Yu, Yi Jing, Wang, Fan, Zou, Huijing, Yang, Bing Xiang, Li, Jianqiang, Fu, Guanghui

arXiv.org Artificial IntelligenceNov-1-2023

In the realm of social media, users frequently convey personal sentiments, with some potentially indicating cognitive distortions or suicidal tendencies. Timely recognition of such signs is pivotal for effective interventions. In response, we introduce two novel annotated datasets from Chinese social media, focused on cognitive distortions and suicidal risk classification. We propose a comprehensive benchmark using both supervised learning and large language models, especially from the GPT series, to evaluate performance on these datasets. To assess the capabilities of the large language models, we employed three strategies: zero-shot, few-shot, and fine-tuning. Furthermore, we deeply explored and analyzed the performance of these large language models from a psychological perspective, shedding light on their strengths and limitations in identifying and understanding complex human emotions. Our evaluations underscore a performance difference between the two approaches, with the models often challenged by subtle category distinctions. While GPT-4 consistently delivered strong results, GPT-3.5 showed marked improvement in suicide risk classification after fine-tuning. This research is groundbreaking in its evaluation of large language models for Chinese social media tasks, accentuating the models' potential in psychological contexts. All datasets and code are made available.

cognitive distortion and suicidal risk, language model benchmark, mental health dataset, (2 more...)

arXiv.org Artificial Intelligence

2309.03564

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

Add feedback